A Learned Index for Exact Similarity Search in Metric Spaces
Authors
Abstract
Indexing is an effective way to support efficient query processing in large databases. Recently, the concept of a learned index, which replaces or complements traditional index structures with machine learning models, has been actively explored to reduce storage and search costs. However, accurate similarity query processing in high-dimensional metric spaces remains an open challenge. In this paper, we propose a novel indexing approach called LIMS that uses data clustering, pivot-based data transformation techniques, and learned indexes to support efficient similarity query processing in metric spaces. In LIMS, the underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions, and the mapped values are totally ordinal. Machine learning models are developed to approximate the position of each data record on disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with state-of-the-art indexes.
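The abstract names the ingredients of LIMS but does not spell out its algorithms. The following is a toy illustration of how those ingredients can fit together to answer a range query. It is not the authors' method. It rests on these assumptions:

- The data are vectors with Euclidean distance, so plain k-means can form the clusters.
- Pivots are chosen farthest-first.
- Each record gets a single one-dimensional key, its distance to the cluster's first pivot, instead of LIMS's own ordinal mapping.
- A least-squares line serves as the "learned" position model.
- An in-memory list stands in for disk pages.

All names (`PivotLearnedIndex`, `LinearModel`, `pick_pivots`, `kmeans`, `EPS`) are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

EPS = 1e-9  # slack so float rounding never prunes a true answer


def dist(a, b):
    # Euclidean distance; any metric obeying the triangle inequality could be swapped in
    return math.dist(a, b)


def kmeans(points, k, iters=10):
    # Minimal Lloyd's k-means (vector data only), seeded with the first k points
    centroids = [list(p) for p in points[:k]]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(xs) / len(g) for xs in zip(*g)]
    return [g for g in groups if g]


def pick_pivots(cluster, m):
    # Farthest-first pivot selection inside one cluster (a hypothetical choice)
    pivots = [cluster[0]]
    while len(pivots) < min(m, len(cluster)):
        pivots.append(max(cluster, key=lambda x: min(dist(x, p) for p in pivots)))
    return pivots


class LinearModel:
    """Least-squares fit of position ~ a * key + b over sorted keys, with max error."""

    def __init__(self, keys):
        n = len(keys)
        mk, mp = sum(keys) / n, (n - 1) / 2
        var = sum((k - mk) ** 2 for k in keys)
        cov = sum((k - mk) * (i - mp) for i, k in enumerate(keys))
        self.a = cov / var if var else 0.0  # >= 0 because keys are sorted
        self.b = mp - self.a * mk
        self.n = n
        self.err = max(abs(self.predict(k) - i) for i, k in enumerate(keys))

    def predict(self, key):
        return self.a * key + self.b

    def window(self, key):
        # Slice of the sorted keys guaranteed to contain the insertion point of key
        guess = self.predict(key)
        lo = min(self.n, max(0, math.floor(guess - self.err) - 1))
        hi = max(lo, min(self.n, math.ceil(guess + self.err) + 2))
        return lo, hi


class PivotLearnedIndex:
    """Toy index: clusters, per-cluster pivots, learned key -> position models."""

    def __init__(self, points, k=2, m=2):
        self.parts = []
        for cluster in kmeans(points, k):
            pivots = pick_pivots(cluster, m)
            # Each row: (distances to all pivots, record); key = distance to pivot 0
            rows = [([dist(x, p) for p in pivots], x) for x in cluster]
            rows.sort(key=lambda row: row[0][0])
            keys = [pd[0] for pd, _ in rows]
            self.parts.append((pivots, keys, rows, LinearModel(keys)))

    def range_query(self, q, r):
        out = []
        for pivots, keys, rows, model in self.parts:
            dq = [dist(q, p) for p in pivots]
            # Triangle inequality: d(q, x) <= r implies |d(q, p) - d(x, p)| <= r
            lo_key, hi_key = dq[0] - r - EPS, dq[0] + r + EPS
            start = bisect_left(keys, lo_key, *model.window(lo_key))
            end = bisect_right(keys, hi_key, *model.window(hi_key))
            for pd, x in rows[start:end]:
                # Other pivots prune further before the exact distance check
                if all(abs(a - b) <= r + EPS for a, b in zip(dq, pd)) and dist(q, x) <= r:
                    out.append(x)
        return out


idx = PivotLearnedIndex([(x, y) for x in range(10) for y in range(10)], k=3, m=2)
print(idx.range_query((4.3, 5.7), 0.5))  # [(4, 6)]
```

The model's recorded maximum error bounds the slice that binary search must inspect. This is a common way for learned indexes to turn an approximate position into an exact one. The remaining pivots then filter candidates through the triangle inequality before any exact distance is computed. Nearest neighbor queries, dynamic updates and the paper's actual mapping are beyond this sketch.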
Similar resources
On Index-Free Similarity Search in Metric Spaces
Metric access methods (MAMs) serve as a tool for speeding similarity queries. However, all MAMs developed so far are index-based; they need to build an index on a given database. The indexing itself is either static (the whole database is indexed at once) or dynamic (insertions/deletions are supported), but there is always a preprocessing step needed. In this paper, we propose D-file, the first...
Similarity Search in Metric Spaces
Similarity search refers to any searching problem which retrieves objects from a set that are close to a given query object as reflected by some similarity criterion. It has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. In this thesis, we examine algorithms designed for similarity search over arbitrar...
Scalable Similarity Search in Metric Spaces
Similarity search in metric spaces represents an important paradigm for content-based retrieval of many applications. Existing centralized search structures can speed-up retrieval, but they do not scale up to large volume of data because the response time is linearly increasing with the size of the searched file. The proposed GHT* index is a scalable and distributed structure. By exploiting par...
A Content-Addressable Network for Similarity Search in Metric Spaces
Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for content-based retrieval in huge and ever growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database uti...
MRoute: A Peer-to-Peer Routing Index for Similarity Search in Metric Spaces
Similarity search for content-based retrieval (where content can be any combination of text, image, audio/video, etc.) has gained importance in recent years, also because of the advantage of ranking the retrieved results according to their proximity to a query. However, to use similarity search in real world applications, we need to tackle the problem of huge volumes of such mixed multimedia da...
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2022
ISSN: 1558-2191, 1041-4347, 2326-3865
DOI: https://doi.org/10.1109/tkde.2022.3206441